Outer Alignment
14 pages tagged “Outer Alignment”
Isn’t AI just a tool like any other? Won’t it just do what we tell it to?
Could we tell the AI to do what’s morally right?
Can you give an AI a goal which involves “minimally impacting the world”?
At a high level, what is the challenge of AI alignment?
Why can’t we just use Asimov’s Three Laws of Robotics?
Why can’t we just make a “child AI” and raise it?
What is the difference between inner and outer alignment?
What is “coherent extrapolated volition (CEV)”?
What is “Do what I mean”?
Which moral theories would be easiest to encode into an AI?
What are “true names” in the context of AI alignment?
What is imitation learning?
What is reward hacking?
What is outer alignment?